Model With Minimal Translation Units, But Decode With Phrases
Authors
Abstract
N-gram-based models co-exist with their phrase-based counterparts as an alternative SMT framework. Both techniques have pros and cons. The N-gram-based framework provides a better model, one that captures both source and target contexts and avoids spurious phrasal segmentation. Phrase-based systems, in turn, have an edge during decoding: their ability to memorize and produce larger translation units yields better search performance and superior selection of translation units. In this paper we combine N-gram-based modeling with phrase-based decoding and obtain the benefits of both approaches. Our experiments show that this combination not only improves the search accuracy of the N-gram model but also improves BLEU scores. Our system outperforms state-of-the-art phrase-based systems (Moses and Phrasal) and N-gram-based systems by a significant margin on German-, French-, and Spanish-to-English translation tasks.
Similar resources
Diversity driven attention model for query-based abstractive summarization
Abstractive summarization aims to generate a shorter version of the document covering all the salient points in a compact and coherent fashion. On the other hand, query-based summarization highlights those points that are relevant in the context of a given query. The encode-attend-decode paradigm has achieved notable success in machine translation, extractive summarization, dialog systems, etc. ...
Towards Neural Phrase-based Machine Translation
In this paper, we present Neural Phrase-based Machine Translation (NPMT). Our method explicitly models the phrase structures in output sequences using Sleep-WAke Networks (SWAN), a recently proposed segmentation-based sequence modeling method. To mitigate the monotonic alignment requirement of SWAN, we introduce a new layer to perform (soft) local reordering of input sequences. Different from ex...
Model in Word
Extracting bilingual dictionaries from corpora can be seen as a very fine-grained alignment process, where the aligned units are not paragraphs or sentences but words and phrases. Most approaches to this problem rely on statistical means to build translation lexicons from bilingual texts, roughly falling into two categories: the hypotheses testing approach and the estimating approach. There are...
Corpus-Driven Study of Translation Units in an English-Chinese Parallel Corpus
It is widely acknowledged that texts are not translated word by word, but unit by unit. Single words are polysemous and therefore ambiguous in translation. Corpus linguistics, in monolingual context, has replaced the traditional basic notion of meaning (words) with the extended unit of meaning. Accordingly, this paper argues that in bilingual context, the translation unit, as the counterpart co...